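Reinforcement learning (RL) has gained increasing traction in academia and the tech industry, with a wide variety of impactful applications and products launched. Although research is being actively conducted on many fronts (e.g., offline RL, performance, etc.), many RL practitioners face a challenge that has been largely ignored: determining whether a designed Markov Decision Process (MDP) is valid and meaningful. This study proposes a heuristic-based feature analysis method to validate whether an MDP is well formulated. We believe that an MDP suitable for applying RL should contain a set of state features that are both sensitive to actions and predictive of rewards. We tested our method in constructed environments, showing that it can identify certain invalid environment formulations. As far as we know, performing validity analysis for RL problem formulation is a novel direction. We envision that our tool will serve as a motivational example to help practitioners apply RL to real-world problems more easily.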
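The criterion in this abstract (state features that are sensitive to actions and predictive of rewards) can be illustrated with a minimal sketch. This is not the authors' heuristic: plain Pearson correlation is swapped in as a stand-in score, and the names `pearson`, `feature_scores`, and the toy transition data are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def feature_scores(transitions):
    """For each state feature, return (action sensitivity, reward predictiveness).

    transitions: list of (state, action, next_state, reward) tuples, where
    state and next_state are lists of floats. Both scores are absolute
    correlations, an assumption made purely for illustration.
    """
    actions = [t[1] for t in transitions]
    rewards = [t[3] for t in transitions]
    scores = []
    for i in range(len(transitions[0][0])):
        current = [t[0][i] for t in transitions]
        delta = [t[2][i] - t[0][i] for t in transitions]
        scores.append((abs(pearson(actions, delta)),
                       abs(pearson(current, rewards))))
    return scores


# Toy environment: feature 0 is a position moved by the action and rewarded
# directly; feature 1 is noise unrelated to both action and reward.
rng = random.Random(0)
transitions = []
position, noise = 0.0, rng.gauss(0, 1)
for _ in range(500):
    action = rng.choice([-1.0, 1.0])
    next_position, next_noise = position + action, rng.gauss(0, 1)
    transitions.append(([position, noise], action,
                        [next_position, next_noise], position))
    position, noise = next_position, next_noise

scores = feature_scores(transitions)
```

In this toy setup, feature 0 scores high on both measures and feature 1 scores low on both, so a formulation whose state contained only features like feature 1 would be flagged as suspect.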
translated by 谷歌翻译
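Diffractive optical neural networks (DONNs) have attracted a lot of attention because they offer significant advantages in power efficiency, parallelism, and computational speed over conventional deep neural networks (DNNs), which have intrinsic limitations when implemented on digital platforms. However, inversely mapping algorithm-trained physical model parameters onto real-world optical devices with discrete values is a non-trivial task, because existing optical devices have non-unified discrete levels and non-monotonic properties. This work proposes a novel device-to-system hardware-software codesign framework that enables efficient physics-aware training of DONNs with respect to arbitrary experimentally measured optical devices across layers. Specifically, Gumbel-Softmax is used to enable a differentiable discrete mapping from real-world device parameters into the forward function of DONNs, so that the physical parameters in DONNs can be trained by simply minimizing the loss function of the ML task. The results show that the proposed framework offers significant advantages over conventional quantization-based methods, especially with low-precision optical devices. Finally, the proposed algorithm is fully verified with physical experimental optical systems in low-precision settings.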
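As a rough illustration of the Gumbel-Softmax relaxation mentioned in this abstract, the sketch below turns a choice among discrete device levels into a weighted average that varies smoothly with the logits. It is not the paper's implementation: the level values, the function names, and the temperature are hypothetical, and a real training loop would run this inside an autodiff framework.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return relaxed one-hot weights over the discrete levels (sums to 1)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) sample: -log(-log(U)), with U clamped away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_device_value(logits, levels, tau, rng):
    """Soft selection: expected device level under the relaxed weights."""
    weights = gumbel_softmax(logits, tau, rng)
    return sum(w * level for w, level in zip(weights, levels))


# Hypothetical measured phase levels (radians): non-uniform and non-monotonic.
levels = [0.0, 0.4, 1.7, 1.1, 3.0, 2.6, 4.9, 5.8]
logits = [0.0] * len(levels)
logits[4] = 5.0  # this pixel currently prefers level index 4

rng = random.Random(0)
weights = gumbel_softmax(logits, tau=1.0, rng=rng)
soft_value = relaxed_device_value(logits, levels, tau=1.0, rng=rng)
# At deployment, the device is set to the level with the largest logit.
hard_value = levels[max(range(len(logits)), key=logits.__getitem__)]
```

Lowering `tau` pushes the weights toward a one-hot vector, so the soft value used during training approaches a level the physical device can actually realize.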
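In autonomous driving, training multiple agents for safe and cooperative control in complex scenarios is a challenge. For a small fleet of cars, this paper proposes Lepus, a new approach to training multiple agents. Lepus adopts a purely cooperative manner of training multiple agents, featuring shared parameters of the policy networks and a shared reward function across agents. In particular, Lepus pre-trains the policy networks via an adversarial process, improving their collaborative decision-making capability and further promoting the stability of car driving. Moreover, to alleviate the problem of sparse rewards, Lepus learns an approximate reward function from expert trajectories by combining a random network and a distillation network. We conduct extensive experiments on the MADRaS simulation platform. The experimental results show that multiple agents trained by Lepus can avoid as many collisions as possible while driving simultaneously, and that they outperform four other methods, namely DDPG-FDE, PSDDPG, MADDPG, and MAGAIL(DDPG), in terms of stability.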
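Entity Set Expansion (ESE) is a valuable task that aims to find entities of the target semantic class described by given seed entities. Various NLP and downstream applications have benefited from ESE because of its ability to discover knowledge. Although existing bootstrapping methods have achieved great progress, most of them still rely on manually pre-defined context patterns. A non-negligible shortcoming of pre-defined context patterns is that they cannot be flexibly generalized to all kinds of semantic classes; we call this phenomenon "semantic sensitivity". To address this problem, we devise a context pattern generation module that uses autoregressive language models (e.g., GPT-2) to automatically generate high-quality context patterns for entities. In addition, we propose GAPA, a novel ESE framework that leverages these generated patterns to expand target entities. Extensive experiments and detailed analyses on three widely used datasets demonstrate the effectiveness of our method. All the code for our experiments will be made available for reproducibility.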
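The multilayer perceptron (MLP), the first neural network structure to appear, was a big hit. But constrained by hardware computing power and dataset size, it then lay dormant for decades. During this period, we witnessed a paradigm shift from manual feature extraction to CNNs with local receptive fields, and further to Transformers with global receptive fields based on the self-attention mechanism. This year (2021), with the introduction of MLP-Mixer, the MLP has re-entered the limelight and attracted extensive research from the computer vision community. Compared with the conventional MLP, it is deeper and changes the input from full flattening to patch flattening. Given its high performance and reduced need for vision-specific inductive bias, the community cannot help but wonder: will the MLP, the simplest structure with global receptive fields but no attention, become a new computer vision paradigm? To answer this question, this survey aims to provide a comprehensive overview of recent developments in deep vision MLP models. Specifically, we review these deep vision MLPs in detail, from subtle sub-module design to global network structure. We compare the receptive field, computational complexity, and other properties of different network designs in order to gain a clear understanding of the development path of MLPs. The investigation shows that the resolution sensitivity and computational density of MLPs remain unresolved, and that pure MLPs are gradually evolving toward CNN-like designs. We suggest that current data volume and computational power are not yet ready to embrace pure MLPs, and that artificial visual guidance remains important. Finally, we provide an analysis of open research directions and possible future work. We hope this effort will ignite further interest in the community and encourage better vision-tailored designs for neural networks.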
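The difference between full flattening and patch flattening can be made concrete with a small sketch. It uses nested Python lists in place of tensors, and the helper names `full_flatten` and `patch_flatten` are hypothetical; it only shows how the input is reshaped, not any particular model from the survey.

```python
def full_flatten(image):
    """Conventional MLP input: one vector holding every pixel value."""
    return [v for row in image for pixel in row for v in pixel]


def patch_flatten(image, patch):
    """Patch-based input: one token per non-overlapping patch x patch block.

    image is a height x width x channels nested list. Tokens are ordered
    row-major over the patch grid, and each token flattens its own pixels.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([v
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)
                           for v in image[r][c]])
    return tokens


# A 4 x 4 single-channel image whose pixel values are 0..15 in row-major order.
image = [[[r * 4 + c] for c in range(4)] for r in range(4)]

flat = full_flatten(image)        # one vector of length 16
tokens = patch_flatten(image, 2)  # 4 tokens of length 4
# tokens[0] == [0, 1, 4, 5]: the top-left 2 x 2 block
```

With patch flattening, the network receives a short sequence of tokens rather than a single long vector, which is what lets a model mix information across tokens and across channels in separate steps.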
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, automatic ME recognition (MER) is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been released to alleviate this problem, the amount of available data remains small. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) while lacking consideration of long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In such scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise, and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, projected images, etc.), network architectures, and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform differently. 2) Although state-of-the-art methods for LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.